There's no 'Count or Predict' but task-based selection for distributional models

Authors

  • Martin Riedl
  • Christian Biemann
Abstract

In this paper, we investigate the differences between prediction-based (word2vec), dense count-based (GloVe) and sparse count-based (JoBimText) semantic models. We selected these models because they can all be computed efficiently on large data, and we evaluate them on word similarity tasks and a semantic ranking task, both for verbs and nouns. We demonstrate that prediction-based models yield higher scores than the other two models at determining a similarity score between two words. In contrast, sparse count-based methods perform best in the ranking task. Further, sparse count-based methods benefit more from linguistically informed contexts, such as dependency relations. In summary, we highlight differences between popular distributional semantic representations and derive recommendations for their usage.
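The word similarity evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or correlation statistic the authors use, so the code below assumes the common setup: cosine similarity between word vectors, compared against human ratings with Spearman's rank correlation. The function names, the sparse dictionary representation, and the toy vectors and gold scores are all hypothetical and serve only as illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {context: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate_similarity(vectors, gold):
    """Correlate model similarities with human scores.

    gold is a list of (word1, word2, human_score); pairs with a word
    missing from the model are skipped (an assumption of this sketch).
    """
    predicted, human = [], []
    for w1, w2, score in gold:
        if w1 in vectors and w2 in vectors:
            predicted.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(predicted, human)

# Toy data, invented for illustration only (not from the paper).
toy_vectors = {
    "car": {"drive": 2.0, "road": 1.0},
    "auto": {"drive": 2.0, "road": 1.0, "engine": 0.5},
    "banana": {"eat": 2.0, "yellow": 1.0},
    "fruit": {"eat": 1.0, "sweet": 1.0},
}
toy_gold = [("car", "auto", 9.0), ("banana", "fruit", 7.0), ("car", "banana", 1.0)]
rho = evaluate_similarity(toy_vectors, toy_gold)
print(f"Spearman rho on toy data: {rho:.3f}")
```

The same `evaluate_similarity` call works for dense models if each dense vector is converted to a dictionary keyed by dimension index, which makes it possible to score all three model types with one routine.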


Related resources

Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a...


A corpus-based evaluation method for Distributional Semantic Models

Evaluation methods for Distributional Semantic Models typically rely on behaviorally derived gold standards. These methods are difficult to deploy in languages with scarce linguistic/behavioral resources. We introduce a corpus-based measure that evaluates the stability of the lexical semantic similarity space using a pseudo-synonym same-different detection task and no external resources. We sho...


Investigating the effect of the Big Five personality factors on adolescents' membership on Facebook

Introduction: Nowadays, Facebook, as a social networking site, is one of the most popular online pastimes among adolescents and young people. Whether a user is drawn to Facebook or reluctant to use it is determined by the user's personality traits. Method: To investigate the effect of the Big Five personality factors on the membership of adolescents on Facebook, 350 students (175 male students and 175 female student...


Modeling Semantic Plausibility by Injecting World Knowledge

Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However both are physically plausible events. This paper introduces the task of semantic plausibility: recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility judgments of single events such as “man swallow p...




Publication date: 2017